inventory management policy
Inventory Optimization Solution in the Azure AI Gallery
This post is co-authored by Dmitry Pechyoni, Senior Data Scientist, Hong Lu and Chenhui Hu, Data Scientists, Praneet Solanki, Software Engineer, and Ilan Reiter, Principal Data Scientist Manager at Microsoft. For retailers, inventory optimization is a critical task to facilitate production planning, cost reduction, and operation management. Tremendous business value can be derived by optimizing the inventories. For example, optimizing product ordering strategy can help minimize inventory cost and reduce stockout events. This also improves customer satisfaction levels.
On the Hardness of Inventory Management with Censored Demand Data
Lugosi, Gábor, Markakis, Mihalis G., Neu, Gergely
We consider a repeated newsvendor problem where the inventory manager has no prior information about the demand, and can access only censored/sales data. In analogy to multi-armed bandit problems, the manager needs to simultaneously "explore" and "exploit" with her inventory decisions, in order to minimize the cumulative cost. We make no probabilistic assumptions---importantly, independence or time stationarity---regarding the mechanism that creates the demand sequence. Our goal is to shed light on the hardness of the problem, and to develop policies that perform well with respect to the regret criterion, that is, the difference between the cumulative cost of a policy and that of the best fixed action/static inventory decision in hindsight, uniformly over all feasible demand sequences. We show that a simple randomized policy, termed the Exponentially Weighted Forecaster, combined with a carefully designed cost estimator, achieves optimal scaling of the expected regret (up to logarithmic factors) with respect to all three key primitives: the number of time periods, the number of inventory decisions available, and the demand support. Through this result, we derive an important insight: the benefit from "information stalking" as well as the cost of censoring are both negligible in this dynamic learning problem, at least with respect to the regret criterion. Furthermore, we modify the proposed policy in order to perform well in terms of the tracking regret, that is, using as benchmark the best sequence of inventory decisions that switches a limited number of times. Numerical experiments suggest that the proposed approach outperforms existing ones (that are tailored to, or facilitated by, time stationarity) on nonstationary demand models. Finally, we extend the proposed approach and its analysis to a "combinatorial" version of the repeated newsvendor problem.